For this tutorial, we will be analyzing one lung adenocarcinoma sample profiled using single-cell resolution spatial transcriptomics (ST)! It is assumed that you have already installed the necessary R packages before running this tutorial. To install DIALOGUE, please see the Github Page Here.
library(DIALOGUE, quietly = T)
library(irlba, quietly = T)
library(ggplot2, quietly = T)
library(patchwork, quietly = T)
library(dplyr, quietly = T)
library(tidyr, quietly = T)
library(psych, quietly = T)
We start by loading the the spatial transcriptomics data. Once again,
the readRDS() function reads compressed RObjects and loads
them into your computing environment in R. The
ST_he2022_luad13.rds file in
/path/to/tutorial/directory/data/ contains one lung
adenocarcinoma sample from the the publicly available ST dataset from this
paper. We are providing “pre-annotated” data which includes (1) raw
counts matrix (cd) (2) gene expression matrix
(tpm) and (3) cell type annotations
(cell.types) and other cell-level meta data, (4) cell
coordinates (coord) and other spatial features.
# read in the data which includes components stored as a list
ST_data <- readRDS("../data/ST_he2022_luad13.rds")
# load colors for plotting
colors <- readRDS("../data/colorBlind.rds")
# look at the components of the data!
names(ST_data)
## [1] "cd" "tpm" "cells" "genes"
## [5] "cell.types" "coor" "spatial_frames" "comp.reads"
# the gene expression matrix is a 960 x 81236 matrix with 960 genes and 81236 cells.
dim(ST_data$tpm)
## [1] 960 81236
Before we dive into the multi-cellular programs, first let us explore/visualize the data by using the cell type annotations and cell coordinates. You can see that although the colors are different, this sample matches the samples show in the right most column of Figure 3b in the DIALOGUE paper.
# make data frame for plotting
plt_df = data.frame(cell.types = ST_data$cell.types, ST_data$coor)
# make and format plot
p = ggplot(plt_df, aes(x = X, y = Y, col = cell.types)) +
geom_point(size = 0.1) +
scale_color_manual(values = c(colors[-15], "grey", "darkgreen")) +
theme_bw() +
coord_fixed() +
guides(color = guide_legend(override.aes = list(size = 3), ncol = 2))
#print
print(p)
The way DIALOGUE formulates spatial “niches” in the tissue is by
dividing the 2D space into grids. The size of your niches is a
hyperparameter you can tune depending on the physical scale of the
user’s hypothesized multicellular programs. This information is stored
in the
spatial_frames component of ST_data
object. Below we plot the cells according to their niche ID.
# make data frame for plotting
plt_df = data.frame(spatial_frames = factor(ST_data$spatial_frames), ST_data$coor)
frame_colors = rep(colors, round(length(unique(ST_data$spatial_frames))/length(colors)))
# make and format plot
p = ggplot(plt_df, aes(x = X, y = Y, col = spatial_frames)) +
geom_point(size = 0.1) +
theme_bw() +
coord_fixed() +
scale_color_manual(values = frame_colors) +
theme(legend.position = "none")
#print
print(p)
One way that we can describe the DIALOGUE framework, is that the
algorithm considers each cell type’s single cell profiles as one
“perspective” in the tissue. We then use DIALOGUE to find out how
Therefore, as input, DIALOGUE takes in a representation of each of the
cell type’s single cell profiles e.g. gene expression (but can be any
type of data). It is recommended to provide a more compact
representation of the gene expression; for example, below we use the top
30 Principle Components (PCs) based on the PCA of each cell type (stored
as the object X for each cell type).
In this analysis, we want to discover how stromal cells (fibroblasts), macrophages, and CD4 T cells coordinate gene programs together in the tumor microenvironment. Therefore, we make a “Cell Type Object” for each of these 3 cell type to prepare for DIALOGUE analysis.
# function for running PCA and extracting PCs
get_pcs <- function(X, dims = 30){
X_colmeans <- colMeans(x = X)
irlba_out <- irlba(A = X, nv = dims, center = X_colmeans)
X_top_pcs <- irlba_out$u %*% diag(x = irlba_out$d, nrow = dims)
row.names(X_top_pcs) <- row.names(X)
colnames(X_top_pcs) <- paste0("PC_", 1:dims)
return(X_top_pcs)
}
# using DIALOGUE's "make.cell.types" function to make 3 cell type objects corresponding
# to fibroblasts, macrophages, and CD4+ T cells.
cell.type.objects <- lapply(c("fibroblast", "macrophage", "T.CD4"), function(cell.type.name){
# subset out data for one cell type
b <- ST_data$cell.types == cell.type.name # cell type boolean
cell_tpm <- ST_data$tpm[,b] # getting cell type gene expression
samples <- ST_data$spatial_frames[b] # getting cell type specific spatial frames
names(samples) <- ST_data$cells[b]
cellQ <- ST_data$comp.reads[b] # getting cell quality metrix (total reads per cell)
metadata = data.frame(ST_data[c("comp.reads", "cell.types")])[b,]
# compute the top PCs for selected cell type
pcs <- get_pcs(t(cell_tpm), dims = 30)
# make the cell type object for selected cell type
celltypeobj <- make.cell.type(
name = cell.type.name, # the name of the cell type
tpm = cell_tpm, # gene expression of only cells of selected cell type
samples = samples, # the spatial frames
X = pcs, # top PCs of the cell type expression matrix
metadata = metadata, # field for additional meta data
cellQ = cellQ, # loading cellQ
)
})
## [1] "Cell type name: fibroblast"
## [1] "Cell type name: macrophage"
## [1] "Cell type name: T.CD4"
Once you have set up the DIALOGUE cell type objects, the API of DIALOGUE wraps everything all together in one function and automatically generates results and plots.
# set up DIALOGUE parameters
param <- DLG.get.param(k = 3, # number of latent dimensions to use
results.dir = "../results/dialogue/", # directory to save the results
extra.sparse = F, # control sparsity of run
covar = "cellQ", # confounder
conf = "cellQ", # cell type confidence
spatial.flag = T, # spatial or not-spatial
pheno = NULL) # meta data fields to test
# run DIALOGUE
DIALOGUE.run(rA = cell.type.objects, # cell type objects from previous chunks
main = "ST_LUAD13", # run name
param)
Once the DIALOGUE run has completed, you can look in the results
folder specified above (which for this tutorial is:
../results/dialogue/), you will find the outputs of
DIALOGUE as either figures (*.pdf files) or
*.rds files.
Let’s take a look at the DLG.output_ST_LUAD13.rds file
and breakdown what sort of information we have! First let’s take a look
at the scores compartment which gives the MCPs’
(multicellular program’s) expression scores in each cell. We can first
check that these MCPs are truly correlated across the spatial niches
between each of the three cell types, as we’ve done so here for
MCP1.
# read in results
rslts <- readRDS("../results/dialogue/DLG.output_ST_LUAD13.rds")
# make plotting data frame that summarizes the scores for each spatial niche
df <- do.call("rbind.data.frame", rslts$scores)
df <- df %>% group_by(samples, cell.types) %>%
summarize(MCP1 = mean(MCP1)) %>%
spread(cell.types, MCP1)
## `summarise()` has grouped output by 'samples'. You can override using the
## `.groups` argument.
# plot the pairs plots, *** denotes statistical significance with p.value < 10e-3
pairs.panels(df[,-1], hist.col = "grey",breaks = 50,ellipses = F,
smooth = F,lm = T,stars = T,method = "pearson")
We can also plot each cell’s score in situ to visualize how the cells’ MCPs expression are spatially correlated. You can see below that the scores are high when all three cell types are co-localized!
# merge the coordinates with the MCP scores
df <- do.call("rbind.data.frame", rslts$scores)
row.names(ST_data$coor) <- ST_data$cells
df <- data.frame(df, ST_data$coor[df$cells,])
df$MCP1 <- -df$MCP1
# plot cell types in situ
a <- ggplot(df, aes(x = X, y = Y, col = cell.types)) +
geom_point(size = 0.1) +
scale_color_manual(values = c(colors[4:5], "grey")) +
coord_fixed() +
theme_bw() +
guides(color = guide_legend(override.aes = list(size = 3)))
# plot mcp expression in situ
b <- ggplot(df, aes(x = X, y = Y, col = MCP1)) +
geom_point(size = 0.1) +
scale_color_gradient2(low = '#009392',
mid = '#f6edbd',
high = '#A01C00') +
coord_fixed() +
theme_bw()
a
b
Now, let’s look at which genes make up the discovered multicellular
programs (MCPs). Below we look at the genes in MCP1.
Looking at these data, we might interpret that IL1
signaling may play a role in coordinating this multicellular program,
given the coordinated induction of IL1B in
macrophages, its receptor IL1R1 in fibroblasts and its
agonist IL1RN in macrophages.
# let's extract out mcp1 (multicellular program 1)
mcp1 <- rslts$MCPs$MCP1
# looking at the down component
mcp1[grepl("down",names(mcp1))]
## $fibroblast.down
## [1] "C11orf96" "CCL3" "CD44" "CD68" "CDKN1A" "CHI3L1"
## [7] "COL5A1" "CYP1B1" "DUSP1" "FCER1G" "FN1" "FOS"
## [13] "GLUL" "HILPDA" "ICAM1" "IER3" "IFITM3" "IL1R1"
## [19] "ITGA5" "ITGAX" "JUNB" "MARCO" "MIF" "MMP1"
## [25] "MMP12" "MMP14" "MT1X" "MT2A" "MYC" "NDRG1"
## [31] "NFKBIA" "OSMR" "PDGFRA" "PTGDS" "PTGES" "RARRES1"
## [37] "RARRES2" "RGS2" "SAT1" "SOD2" "SPP1" "SRGN"
## [43] "STAT3" "THBS2" "TIMP1" "TYK2" "VEGFA" "VIM"
## [49] "ZFP36"
##
## $macrophage.down
## [1] "ANXA2" "B2M" "BCL2L1" "CCL2" "CCL3" "CCL3L3"
## [7] "CCL4" "CCR1" "CD19" "CD300A" "CD68" "CDKN1A"
## [13] "CHI3L1" "CLEC5A" "CYSTM1" "DUSP1" "FABP5" "FCER1G"
## [19] "FGR" "FLT1" "FN1" "GLUL" "GPNMB" "HSP90AA1"
## [25] "ICAM1" "IL1B" "IL1RN" "ITGA5" "ITGAX" "JUN"
## [31] "JUNB" "LGALS1" "MARCO" "MIF" "MMP12" "MMP9"
## [37] "MT1X" "MT2A" "NDRG1" "NR1H3" "S100A10" "S100A6"
## [43] "S100A8" "S100A9" "SAT1" "SLC2A1" "SOD2" "SPP1"
## [49] "SRC" "SRGN" "TIMP1" "TYROBP" "VEGFA" "VIM"
## [55] "ZFP36"
##
## $T.CD4.down
## [1] "B2M" "BTG1" "C11orf96" "CCL4" "CCL5" "CHI3L1"
## [7] "COL4A2" "CXCR4" "CYP1B1" "FCER1G" "FN1" "GZMK"
## [13] "HIF1A" "HILPDA" "HSP90AA1" "IFITM1" "IFITM3" "JUNB"
## [19] "MARCO" "MIF" "MMP1" "MMP12" "MT1X" "MT2A"
## [25] "NDRG1" "PTGDS" "RARRES2" "S100A9" "SOD2" "SPP1"
## [31] "SRGN" "TIMP1" "VEGFA" "VIM" "ZFP36"
And…that’s it for now!